Below is the PM2.5 concentration, in micrograms per cubic meter (µg/m³), from 2015 to 2017. The highest concentrations are in Oakland and Vallejo.
Below is the age-adjusted rate of emergency department (ED) visits for asthma per 10,000, averaged over 2015-2017. The highest rates appear to be in the Vallejo and San Leandro areas.
The line of best fit for this regression is acceptable, but the data points are bunched tightly together, which makes the relationship hard to judge.
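Variation in PM2.5 explains 9.6% of the variation in Asthma. For every 1 µg/m³ increase in PM2.5, I predict an increase of about 19.9 in Asthma (the slope estimate; the value 116 is the magnitude of the intercept, not the slope).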
##
## Call:
## lm(formula = Asthma ~ PM2.5, data = ces4_bay)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -54.47 -25.89  -9.61  12.94 182.95
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -116.278     13.040  -8.917   <2e-16 ***
## PM2.5         19.862      1.534  12.950   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 37.49 on 1578 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.09606, Adjusted R-squared: 0.09549
## F-statistic: 167.7 on 1 and 1578 DF, p-value: < 2.2e-16
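To make the slope interpretation concrete, here is a small sketch that plugs the rounded coefficients from the summary above into the fitted line. The PM2.5 values 8 and 9 are arbitrary illustration points that I chose, not values taken from the data set, and `predict_asthma` is a helper name made up for this example.

```r
# Coefficients copied (rounded) from the lm(Asthma ~ PM2.5) summary above
intercept <- -116.278
slope     <- 19.862

# Fitted line: predicted asthma ED visit rate at a given PM2.5 concentration
# (hypothetical helper, written only for this illustration)
predict_asthma <- function(pm25) intercept + slope * pm25

# Two illustrative PM2.5 values one unit apart (assumed, not from the data)
low  <- predict_asthma(8)
high <- predict_asthma(9)

# The gap between the two predictions equals the slope
difference <- high - low
print(difference)
```

Whatever pair of PM2.5 values is used, a one-unit gap always changes the prediction by the slope, which is why the slope, and not the intercept, is the number to quote.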
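From the textbook, “regression models are only valid if a set of conditions are true about the data, including that the mean of the residuals is ~ 0, and that the residuals are normally distributed.” The mean of least-squares residuals is 0 by construction, but here the residuals are clearly not normally distributed: the median is -9.61 rather than about 0, and the largest residual (182.95) is far larger in magnitude than the smallest (-54.47), so the distribution is strongly right-skewed.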
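One rough way to quantify the asymmetry is to compare the two tails of the residual summary printed in the model output. The sketch below uses only the five numbers from that output; the "tail ratio" is an informal check I am assuming for illustration, not a formal normality test.

```r
# Five-number summary of the residuals, copied from the model output above
res_summary <- c(min = -54.47, q1 = -25.89, median = -9.61,
                 q3 = 12.94, max = 182.95)

# For a symmetric distribution the two tails are roughly equally long
lower_tail <- res_summary[["median"]] - res_summary[["min"]]
upper_tail <- res_summary[["max"]] - res_summary[["median"]]

# A ratio well above 1 points to a long right tail (right skew)
tail_ratio <- upper_tail / lower_tail
print(round(tail_ratio, 2))
```

With the fitted model object available, a histogram or normal Q-Q plot of the residuals would be the more usual check.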
Here the data look more symmetrical, and the best-fit line appears to represent the data points well.
Variation in PM2.5 explains 10% of the variation in log(Asthma). For every 1 µg/m³ increase in PM2.5, I predict that Asthma is multiplied by exp(0.356) ≈ 1.43, which is an increase of about 43%.
##
## Call:
## lm(formula = log(Asthma) ~ PM2.5, data = ces4_bay)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -2.00402 -0.46479  0.03313  0.42298  1.75525
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.69234    0.22840   3.031  0.00248 **
## PM2.5        0.35633    0.02686  13.264  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6566 on 1578 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.1003, Adjusted R-squared: 0.09974
## F-statistic: 175.9 on 1 and 1578 DF, p-value: < 2.2e-16
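Because the response in this model is log(Asthma), the slope has to be exponentiated before it can be read on the original scale. The sketch below does that arithmetic with the slope copied from the summary above; the variable names are my own.

```r
# Slope copied from the lm(log(Asthma) ~ PM2.5) summary above
slope_log <- 0.35633

# A one-unit increase in PM2.5 adds slope_log to log(Asthma),
# which multiplies Asthma itself by exp(slope_log)
multiplier     <- exp(slope_log)
percent_change <- (multiplier - 1) * 100

# PM2.5 increase that would be needed for an actual doubling of Asthma
units_to_double <- log(2) / slope_log

print(round(c(multiplier = multiplier,
              percent_change = percent_change,
              units_to_double = units_to_double), 2))
```

So a one-unit increase in PM2.5 corresponds to a multiplicative change in the predicted asthma rate, and a doubling would take a PM2.5 increase of a little under 2 µg/m³.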
The residuals are closer to normally distributed, which shows that the log transformation is better suited to the data.
A negative residual in the context of Asthma estimation means that the observed value was lower than the predicted value, so the area has fewer asthma ED visits than the model expects given its PM2.5. The region with the most negative residual is Stanford. I think this is because we are only using a three-year sample, and the model predicts a high visit rate for a large, diverse student population with varying health concerns. The observed rate is much lower than that, and the gap is exacerbated by the fact that students tend to graduate and move out fairly frequently.
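As a sketch of how the most negative residual could be located, the example below fits the same kind of model to a tiny toy data frame. The `toy` values and tract labels are invented purely for illustration and are not the `ces4_bay` data. Since the real model dropped one observation for missingness, the sketch assumes `na.action = na.exclude` so that the residuals stay aligned with the rows.

```r
# Toy data, made up purely for illustration (NOT the ces4_bay values)
toy <- data.frame(
  tract  = c("A", "B", "C", "D", "E"),
  PM2.5  = c(7.5, 8.0, 8.5, 9.0, 9.5),
  Asthma = c(30, 45, NA, 20, 80)
)

# na.exclude pads the residuals with NA so they line up with the rows of `toy`
fit <- lm(log(Asthma) ~ PM2.5, data = toy, na.action = na.exclude)
toy$resid <- residuals(fit)

# Row with the most negative residual: observed value furthest below prediction
most_negative <- toy[which.min(toy$resid), ]
print(most_negative$tract)
```

Without `na.exclude`, the residual vector would be one element shorter than the data frame, and matching residuals to rows by position could point at the wrong area.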